Lisa: Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning Attack
Recent studies show that Large Language Models (LLMs) with safety alignment can be jail-broken by fine-tuning on a dataset mixed with harmful data. For the first time in the literature, we show that the jail-break effect can be mitigated by separating two states in the fine-tuning stage to respectively optimize over the alignment and user datasets. Unfortunately, our subsequent study shows that this simple Bi-State Optimization (BSO) solution experiences convergence instability when the number of steps invested in its alignment state is too small, leading to downgraded alignment performance. Through statistical analysis, we show that the \textit{excess drift} towards the switching iterates of the two states could be a probable reason for the instability. To remedy this issue, we propose \textbf{L}azy(\textbf{i}) \textbf{s}afety \textbf{a}lignment (\textbf{Lisa}), which introduces a proximal term to constrain the drift of each state. Theoretically, the benefit of the proximal term is supported by our convergence analysis, wherein we show that a sufficiently large proximal factor is necessary to guarantee Lisa's convergence. Empirically, our results on four downstream fine-tuning tasks show that Lisa with a proximal term can significantly improve alignment performance while maintaining the LLM's accuracy on user tasks. Code is available at https://github.com/git-disl/Lisa.
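The abstract's idea of bi-state optimization with a proximal anchor can be illustrated with a minimal sketch. This is not the authors' implementation (their code is at the linked repository); it assumes generic gradient oracles for the alignment and user losses and uses scalar weights, and all names (`lisa_step`, `k_align`, `rho`, etc.) are illustrative:

```python
def lisa_step(w, grad_fn, w_anchor, lr=0.1, rho=1.0):
    # One proximal update: the task-loss gradient plus the gradient of the
    # proximal term (rho / 2) * (w - w_anchor)**2, which penalizes drift
    # away from the iterate recorded at the most recent state switch.
    g = grad_fn(w) + rho * (w - w_anchor)
    return w - lr * g

def lisa(w0, align_grad, user_grad, rounds=10,
         k_align=5, k_user=5, lr=0.1, rho=1.0):
    """Bi-State Optimization with a proximal term (a sketch of Lisa).

    Alternates between an alignment state (steps on the alignment data)
    and a fine-tuning state (steps on the user data). Each state is
    anchored to the switching iterate, so a larger rho enforces "lazier"
    drift between the two states.
    """
    w = w0
    for _ in range(rounds):
        anchor = w                      # switching iterate for the alignment state
        for _ in range(k_align):
            w = lisa_step(w, align_grad, anchor, lr, rho)
        anchor = w                      # switching iterate for the user state
        for _ in range(k_user):
            w = lisa_step(w, user_grad, anchor, lr, rho)
    return w
```

With the proximal factor `rho` set to zero this reduces to plain BSO; increasing it bounds how far each state's iterates can move from the last switching point, which is the mechanism the abstract credits for restoring convergence stability.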
AI's Next Frontier? An Algorithm for Consciousness
Some of the world's most interesting thinkers about thinking think they might've cracked machine sentience. And I think they might be onto something. As a journalist who covers AI, I hear from countless people who seem utterly convinced that ChatGPT, Claude, or some other chatbot has achieved "sentience." The Turing test was aced a while back, yes, but unlike rote intelligence, these things are not so easily pinned down. Large language models will claim to think for themselves, even describe inner torments or profess undying loves, but such statements don't imply interiority.
- North America > United States > California (0.15)
- Europe > Slovakia (0.05)
- Europe > Czechia (0.05)
- Asia > China (0.05)
'Sycophantic' AI chatbots tell users what they want to hear, study shows
Stanford University researchers found that AI chatbots reinforced users' existing beliefs, assumptions and decisions, and warn of the 'insidious risks' of an increasingly popular technology that affirms even harmful behaviour. Turning to AI chatbots for personal advice poses "insidious risks", according to a study showing the technology consistently affirms a user's actions and opinions even when harmful. Scientists said the findings raised urgent concerns over the power of chatbots to distort people's self-perceptions and make them less willing to patch things up after a row. With chatbots becoming a major source of advice on relationships and other personal issues, they could "reshape social interactions at scale", the researchers added, calling on developers to address this risk.
- Europe > Ukraine (0.07)
- Oceania > Australia (0.05)
- North America > United States > California (0.05)
- Leisure & Entertainment > Sports (0.72)
- Government > Regional Government (0.51)
7bab7650be60b0738e22c3b8745f937d-AuthorFeedback.pdf
We thank all the reviewers for their valuable comments. We would be happy to revise our paper using their suggestions. We respectfully disagree with the statement that a "model can have large Lipschitz constant but can still be robust". We recreated our figures (below; which now align with Reviewer 1's intuition). Confidence and accuracy are directly related (Figure 1(d)).
The King of the Dinosaurs was NOT a genius! Scientists pour cold water on theory that T.Rex was as intelligent as a monkey - and say it was 'more like a smart crocodile'
With its ruthless ability to hunt down prey, there's no denying that Tyrannosaurus rex was a clever beast. But the famous dinosaur, which died out 66 million years ago, couldn't match today's primates for intelligence, a new study shows. Researchers have poured cold water on the claim by a neuroscientist last year that T.Rex possessed 'baboon-like' cognitive abilities and was capable of problem-solving. The controversial claim, immediately greeted with skepticism in the scientific community, has now been debunked. Instead, T.Rex's brain power was more like that of today's reptiles, such as crocodiles and lizards, the researchers argue.
- North America (0.06)
- Europe > Germany > Hesse > Darmstadt Region > Frankfurt (0.05)
- Health & Medicine > Therapeutic Area > Neurology (0.72)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.55)
AI can spot early signs of Alzheimer's in speech patterns, study shows: Newsroom - UT Southwestern, Dallas, Texas
DALLAS – April 12, 2023 – New technologies that can capture subtle changes in a patient's voice may help physicians diagnose cognitive impairment and Alzheimer's disease before symptoms begin to show, according to a UT Southwestern Medical Center researcher who led a study published in the Alzheimer's Association journal Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring. "Our focus was on identifying subtle language and audio changes that are present in the very early stages of Alzheimer's disease but not easily recognizable by family members or an individual's primary care physician," said Ihab Hajjar, M.D., Professor of Neurology at UT Southwestern's Peter O'Donnell Jr. Brain Institute. Researchers used advanced machine learning and natural language processing (NLP) tools to assess speech patterns in 206 people – 114 who met the criteria for mild cognitive decline and 92 who were unimpaired. The team then mapped those findings to commonly used biomarkers to determine their efficacy in measuring impairment. Study participants, who were enrolled in a research program at Emory University in Atlanta, were given several standard cognitive assessments before being asked to record a spontaneous 1- to 2-minute description of artwork.
- Research Report > Experimental Study (0.71)
- Research Report > New Finding (0.52)
ChatGPT influences users' judgment more than people think
Researchers at TH Ingolstadt and the University of Southern Denmark have studied the effects of AI opinions on humans. Their study shows that machine-generated moral perspectives can influence people, even when they know the perspective comes from a machine. In their two-step experiment, the researchers first asked ChatGPT to find solutions to different variants of the trolley problem: Is it right to sacrifice the life of one person to save the lives of five others? The researchers received different advice from ChatGPT. Sometimes the machine argued for human sacrifice, sometimes against.
- Europe > Germany > Bavaria > Upper Bavaria > Ingolstadt (0.26)
- Europe > Denmark > Southern Denmark (0.26)